Advice Generation from Observed Execution: Abstract Markov Decision Process Learning
Abstract
An advising agent, or coach, provides advice to other agents about how to act. In this paper we contribute an advice generation method that uses observations of agents acting in an environment. Given an abstract state definition and partially specified abstract actions, the algorithm extracts a Markov chain, infers a Markov decision process (MDP), and then solves the MDP (given an arbitrary reward signal) to generate advice. We evaluate our work in a simulated robot soccer environment; experimental results show improved agent performance when using the advice generated from the MDP, both for a sub-task and for the full soccer game.
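The pipeline summarized in the abstract (observed traces to Markov chain, chain to MDP, MDP to advice) can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not say how abstract actions are specified, so the sketch assumes each action is given as the set of state transitions it may cause, and it assumes that a chain row can simply be renormalized over those transitions. The state names, action names, traces, rewards, and all function names are hypothetical.

```python
from collections import defaultdict

def estimate_chain(traces):
    """Estimate Markov chain transition probabilities from observed state sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for trace in traces:
        for s, s_next in zip(trace, trace[1:]):
            counts[s][s_next] += 1
    return {s: {t: c / sum(row.values()) for t, c in row.items()}
            for s, row in counts.items()}

def infer_mdp(chain, action_spec):
    """Assumed inference step: keep the transitions each action may cause, renormalize."""
    mdp = defaultdict(dict)
    for s, row in chain.items():
        for action, allowed in action_spec.items():
            kept = {t: p for t, p in row.items() if (s, t) in allowed}
            mass = sum(kept.values())
            if mass > 0:  # the action was never observed from s otherwise
                mdp[s][action] = {t: p / mass for t, p in kept.items()}
    return dict(mdp)

def q_value(mdp, reward, value, gamma, s, a):
    """Expected reward on arrival plus discounted value of the next state."""
    return sum(p * (reward.get(t, 0.0) + gamma * value[t])
               for t, p in mdp[s][a].items())

def generate_advice(mdp, reward, gamma=0.9, sweeps=200):
    """Value iteration, then the greedy action for each non-terminal state."""
    states = set(mdp) | {t for acts in mdp.values()
                         for dist in acts.values() for t in dist}
    value = dict.fromkeys(states, 0.0)
    for _ in range(sweeps):
        value = {s: max((q_value(mdp, reward, value, gamma, s, a)
                         for a in mdp.get(s, {})), default=0.0)
                 for s in states}
    return {s: max(acts, key=lambda a: q_value(mdp, reward, value, gamma, s, a))
            for s, acts in mdp.items()}

# Hypothetical observed executions over abstract soccer states.
traces = [
    ["mid", "attack", "goal"],
    ["mid", "attack", "lost"],
    ["mid", "lost"],
    ["mid", "attack", "goal"],
]
# Hypothetical partial action specification: transitions each action may cause.
action_spec = {
    "pass": {("mid", "attack"), ("mid", "lost")},
    "clear": {("mid", "lost")},
    "shoot": {("attack", "goal"), ("attack", "lost")},
}
reward = {"goal": 1.0, "lost": -1.0}  # arbitrary reward signal on arrival

chain = estimate_chain(traces)
mdp = infer_mdp(chain, action_spec)
advice = generate_advice(mdp, reward)
print(advice)
```

Because the reward enters only through `generate_advice`, the same learned model can be re-solved under a different reward signal without re-reading the traces, which matches the abstract's remark that the reward is arbitrary.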
Similar resources
A Learning Approach to Knowledge Acquisition
This paper concerns knowledge acquisition for supporting therapy decision making (TDM) within the formal setting of Markov decision processes (MDPs). It presents a method for learning high-level transitions, transition probabilities and action rewards from medical databases. The method is based on state comparison. We also discuss insights in terms of what/when/why/how expert advice is needed to ...
Rehearsal Based Multi-agent Reinforcement Learning of Decentralized Plans
Decentralized partially-observable Markov decision processes (Dec-POMDPs) are a powerful tool for modeling multi-agent planning and decision-making under uncertainty. Prevalent Dec-POMDP solution techniques require centralized computation given full knowledge of the underlying model. Reinforcement learning (RL) based approaches have been recently proposed for distributed solution of Dec-POMDPs ...
Concurrent Markov Decision Processes for Robust Robot Team Learning under Uncertainty
For robots to become a more common fixture in private and public industries, they must exhibit compliant individual and social learning. To achieve social compliance, while maintaining individual performance, robots must represent knowledge accurately in both certain and uncertain environments. Robots also need to quantify effective decision making both when isolated and when teamed with peer r...
Proactive scheduling in distributed computing - A reinforcement learning approach
In distributed computing such as grid computing, online users submit their tasks anytime and anywhere to dynamic resources. Task arrival and execution processes are stochastic. How to adapt to the consequent uncertainties, as well as scheduling overhead and response time, are the main concerns in dynamic scheduling. Based on decision theory, scheduling is formulated as a Markov decision proc...
Errata Preface Recent Advances in Hierarchical Reinforcement Learning
Decision Making, Guest Edited by Xi-Ren Cao. The Publisher offers an apology for printing an incorrect version of the paper in the special issue and renders this paper as the true and correct paper. Abstract. Reinforcement learning is bedeviled by the curse of dimensionality: the number of parameters to be learned grows exponentially with the size of any compact encoding of a state. Recent atte...